1.  INTRODUCTION


During the period 14-16 February 1977, meetings were held at
the Naval Postgraduate School (NPS) to discuss the TPQ-27 PSVT.
Participating in these meetings were Major Bari Peete (MAD,
Pt. Mugu), Major Dave Allen (MCTSSA, Camp Pendleton), Capt.
Jerry Paccassi and myself (both at NPS). Also in attendance were
Mike Patrow and Mike Lowe, students at NPS. A test concept was
developed which called for bomb drops in 18 cells forming a
"baseline" group, together with additional "demonstration"
drops conducted under eight additional combinations of
conditions. These combinations are shown in Figure 1. Within
each cell of the design for baseline drops, a test is to be made
of whether the contract specified CEP's have been met.

In what follows, we discuss the design, certain aspects
of performing the trials in the field, and an outline of the
analysis procedure proposed for testing CEP's and making other
inferences from the test data. Some of these comments came out
of discussions at the NPS meeting, and others are suggestions
and observations by the author.

2.  THE  STATISTICAL  DESIGN 

It is desirable to test the TPQ-27 over a wide range
of levels of the variables involved, in order to facilitate
inference about performance characteristics of this system
and its sensitivity and response to variations in drop conditions.

[Figure 1 design matrix for the baseline drops, laid out by
altitude (k-feet), range (mi) and mode (Auto, Voice Vector,
Skin Track); the cell entries are not recoverable from this copy.]

NOTE: "X" denotes drops at 500 kts; "O" denotes 350 kts.

Demonstration drops

PGC/STICK:    2 sticks at 20 mi, 10k ft, 500 kts
SEMIAUTO:     5 mi, 20k, 500 kts; 20 mi, 20k, 500 kts; and 55 mi, 20k, 500 kts
MANUAL:       20 mi, 20k, 500 kts
WIND AT ALT:  20 mi, 20k, 500 kts; 55 mi, 20k, 500 kts
RDL:          20 mi, 20k, 500 kts

FIGURE 1. Combination of conditions under which drops are
planned in the PSVT.


However, this testing, involving dropping bombs on an
instrumented range, is expensive. Thus there is also a conflicting
desire to hold the sample size as low as possible, consistent
with achieving reasonable confidence in the tests and in the
inferences to be made. For this reason, a sample of baseline
conditions was established, in which most of the drops are to
be made. The baseline cases were selected so as to cover a
fairly large portion of operationally realistic conditions.
The data resulting from these baseline drops will allow testing
against contract specified CEP's in each cell, as well as
subsequent analyses such as testing whether there are significant
differences due to the factors range, altitude, range x
altitude interaction, speed, mode, speed x mode interaction
and speed x altitude interaction. In addition, estimates of
the type and amount of response to changes in the main effects
(for Auto mode) can be made. For the demonstration cells,
tests against contract specified CEP's can also be made.

The nature of the tests of CEP has not been completely
determined at this time, but appears to have been narrowed
down to several candidates. Sequential testing within each
cell of the design appears attractive because of the expected
savings in numbers of bomb drops. In Section 4 below we outline
two possible sequential procedures (called "sequential
Rayleigh" and "sequential nonparametric") as well as two fixed
sample size procedures (called "fixed Rayleigh" and "fixed
nonparametric"). Sample size characteristics of the sequential
and fixed-sample size tests are shown in Tables 1 and 2.

TABLE 2. Sequential and Fixed Rayleigh Characteristics for Several $\alpha$, $\beta$, $\mathrm{CEP}_1/\mathrm{CEP}_0$ Combinations.

The column heads in Tables 1 and 2 are as follows:

$\alpha$ : probability the test rejects $H_0: \mathrm{CEP} = C_0$ in favor of
    $H_1: \mathrm{CEP} = C_1$, when in fact the system has $\mathrm{CEP} = C_0$.

$\beta$ : probability the test accepts $H_0$ when in fact the system
    has $\mathrm{CEP} = C_1$.

$\mathrm{CEP}_1/\mathrm{CEP}_0$ : ratio of minimum unacceptable CEP to contract
    specified CEP.

min accept : the smallest possible sample size at termination with
    acceptance of $H_0$ (i.e., the sample size required to accept
    even a perfect system).

min reject : the smallest sample size possible for rejecting $H_0$
    (for the nonparametric sequential procedure only; for the
    sequential Rayleigh procedure, the min reject number is 1 for
    all cases). NOTE: for the nonparametric case, round up to
    integer values where necessary.

Low reject : the sample size required for rejection if all radial
    misses fell at distance $\mathrm{CEP}_1$ from the target (for
    sequential Rayleigh only).

Max E(N) : the worst case expected sample size for the sequential
    procedures (this occurs for some true system CEP between
    $\mathrm{CEP}_0$ and $\mathrm{CEP}_1$).

Typical N : average of the sequential tests' expected sample sizes
    under $H_0$ and under $H_1$.

N fixed : sample size required by the fixed-sample size procedures.

slope : slope of the lines forming the boundaries of the continuation
    region for sequential procedures.

Accept intercept : the y-intercept of the boundary line defining the
    accept region for sequential procedures.

Reject intercept : the y-intercept of the boundary line defining the
    reject region for sequential procedures. NOTE: for the
    sequential nonparametric procedures, the y-intercepts are
    symmetric if $\alpha = \beta$; otherwise the x-intercept of the
    rejection line is given under "Min reject."

Max $3\sigma_N$ : three times the max E(N). This is roughly two standard
    deviations above the expected sample size; virtually none of
    the tests should continue beyond this value.


The values shown in Tables 1 and 2 pertaining to
sequential tests were obtained using Wald's approximations,
and are therefore slightly conservative. Exact stopping bounds
are available for these tests (for example those prepared by
Leo A. Aroian at TRW Systems, Redondo Beach, California), and
they should be used if the sequential approach to testing CEP
is adopted. Truncation of the sequential test was considered,
but it appears undesirable for several reasons: 1) truncation
increases average sample sizes, 2) truncation complicates the
computation of acceptance and rejection bounds (although,
again, tables may be available covering most of our cases),
and 3) the terminal decision for cases reaching the truncation
point is somewhat arbitrary. In addition, for the $\alpha$, $\beta$,
$\mathrm{CEP}_1/\mathrm{CEP}_0$ combinations we can realistically anticipate
(see Tables 1 and 2 with $\alpha$ and $\beta$ on the order of 0.10 and
$\mathrm{CEP}_1/\mathrm{CEP}_0$ about 2, for example), max $3\sigma_N$ (which is
essentially an upper bound on sample size $N$) is not unacceptably
large, in view of the fact that over the many cells of the design,
with an individual sequential test being performed in each
cell, the overall average sample size per cell will almost
certainly fall below max E(N). Consequently, it is felt that
truncation would only cause an unnecessary increase in overall
drop requirements for the entire test sequence.

An alternative to untruncated sequential testing is
to use fixed sample size tests. This has the effect of
balancing the number of drops in the various cells of the
design matrix, which is desirable for the subsequent analyses
concerning differences due to the various factors. However,
as with truncation of the sequential procedures, the overall
sample size requirements are larger for fixed sample size tests.
It is our feeling that the balance in design achieved by fixed
sample size testing is far outweighed by its disadvantage with
respect to overall sample size requirement. As is discussed
in the succeeding section, the way in which the field tests
may be carried out will tend to balance the design even with
sequential testing in each cell, and this further points to
the superiority of sequential testing.


In order to avoid losing efficiency in the PSVT, it
is desirable that drops be conducted in such a way as to avoid
(as much as possible) confounding factors suspected to affect
system performance, and to provide "insurance" against bias
in results due to unknown causes. Ideally this would be in
part accomplished by scheduling individual drops over the
various cells of the design using a formal randomization
procedure. This might mean, for example, that a single flight
(operation) would call for first dropping a bomb at 300 kts,
20k ft altitude at 20 mi range in Auto mode, next dropping a
bomb at 500 kts, 10k altitude at 55 mi range in Auto mode,
and so on for the remaining bombs to be dropped in this
operation. Clearly such a schedule may not be practical, so
constraints must be imposed on the scheduling process. The
author is not in a position to assess what constraints are
necessary, but he wishes to point out the desirability of
imposing as little constraint as possible.

In order to gain appreciation of the possible effects
of confounding mentioned above, consider an example test
schedule in which the first group of operations are all conducted
at 500 kts, 10k altitude, 20 mi range, Auto mode. These drops
might be followed by operations all at 500 kts, 20k, 20 mi, Auto,
etc. Suppose, moreover, each individual operation (consisting
of eight bombs) is constrained such that all eight bombs are
dropped under the same conditions (in the same cell of the
design). Then factors having to do with each individual
operation (such as radar alignment, pilot effect, wind profile
errors, etc.), whose effects for the given operation may be
unknown or only partially known (even using ARIS), cannot be
"balanced out"; rather they may cause bias of an amount
undeterminable by the experimenter and analyst. Similarly,
conducting operations all with fixed combinations of speed,
range, etc. close together in time would preclude balancing
out unknown long term trend effects (if any).

There is another reason why allowing drops in different
cells in a single operation would be desirable. If a sequential
test plan is adopted for CEP testing, forcing observations to
be made in batches of eight (say) in a given cell of the
design rather than one at a time (i.e., no closer together
in time than the miss distance determination turnaround time)
will generally lead to larger than necessary sample sizes,
perhaps substantially larger. As a rough assessment of the
effect of such "batch" testing relative to ordinary sequential
testing, consider the nonparametric sequential test with
$\alpha = \beta = .1$ and $\mathrm{CEP}_1/\mathrm{CEP}_0 = 2$. Then the "typical" expected
sample size is about 6.3. Imagine for the moment sample size
$N$ is roughly exponentially distributed (which is certainly
an oversimplification but is consistent with the observation
that in many cases the mean and standard deviation of $N$ are
about the same, and is adequate for the present discussion).
With batches of size eight, one batch would be required with
a probability on the order of .7, two batches with probability
about .2 and three batches with probability roughly .1. Thus
the expected number of batches required would be about 1.4,
or roughly 11 drops per cell on the average. Thus the effect
of batch arrivals of observations in each cell is an increase
in total drops for the experiment, perhaps by as much as 75%.
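
The rough batch calculation above can be reproduced with a short sketch. It assumes, as the text does, that N is exponentially distributed with mean 6.3; unlike the text, it lets the number of batches run past three instead of lumping the tail into the third batch. The function name and arguments are illustrative only.

```python
import math

def batch_testing_summary(mean_n=6.3, batch_size=8, max_batches=3):
    """Rough effect of batch testing, assuming N ~ exponential(mean_n)."""
    # Probability that a sequential test is still running after one full batch.
    q = math.exp(-batch_size / mean_n)
    # P(exactly j batches are needed) = P((j-1)*batch_size < N <= j*batch_size).
    probs = [q ** (j - 1) * (1.0 - q) for j in range(1, max_batches + 1)]
    # The batch count is geometric, so its mean is 1 / (1 - q).
    expected_batches = 1.0 / (1.0 - q)
    expected_drops = batch_size * expected_batches
    relative_increase = expected_drops / mean_n - 1.0
    return probs, expected_batches, expected_drops, relative_increase

probs, e_batches, e_drops, increase = batch_testing_summary()
print([round(p, 2) for p in probs])
print(round(e_batches, 2), round(e_drops, 1), round(100 * increase))
```

Under these assumptions the sketch gives about 1.4 batches and roughly 11 drops per cell, in line with the figures quoted above.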

In summary, the implication of the foregoing discussion
is that it may well be worth expending test resources to allow
individual drops in more than one cell within a given operation.
In addition, variables such as aircraft heading, time of day,
order within the overall test sequence, weather, etc. should
"be varied" as much as practicable within a given cell of the
design (that is, have as many variations and combinations of
levels as practical associated with the drops in each given
cell). This may be viewed as "buying insurance" against
unforeseen effects of unknown causes in the experiment; in
addition, such an approach may allow deduction of probable
causes of system misbehavior in some cases of importance, should
such difficulty be experienced in the PSVT.

Final comments on the field conduct that the author
would like to mention are as follows. First, there should be
no possibility of specialized "tweaking" of the system (by
either test personnel or the contractor) to alter its
performance in any way for the tests. This may involve careful
monitoring of any software changes, for example. Secondly, if
the sequential approach to CEP testing is to be adopted, there
should be a mechanism for assessing each drop miss distance (or
hit-miss outcome) in a period of time which is short relative
to the following time interval standards. If individual drops
are continued within a cell within a given operation only until
sequential termination, the standard is the operation duration
(hours?). If batch testing is used within each cell of the
design, the standard would be time between operations (days?).
If the individual drops within each operation are allocated
to various cells of the design (which I recommend if at all
possible), the standard is the time spent at a given range
(weeks?). Thus in the latter case there is perhaps not a
measurement "turn-around time" problem at all, an additional
bonus in taking this approach.

4.  STATISTICAL  ANALYSIS  PLAN 

There are two levels of analysis in the PSVT plan.
The primary goal is to test whether system performance in each
cell of the design is within design specifications. The secondary
analyses concern determining which factors have significant
effect, and what the effects are.

For the primary tests of CEP, there appear to be
several alternatives: sequential parametric test (SPT), sequential
nonparametric test (SNT), fixed-sample size parametric test
(FPT) and fixed-sample size nonparametric test (FNT). The
SPT and SNT are discussed in an earlier report [1] and we thus
give only a very brief comment on them here. The FPT and FNT
are discussed below. All of the tests involve testing whether
the system displays accuracy (in each given cell of the design)
to within the contract specified CEP, say $\mathrm{CEP}_0$, or whether it
has performance worse than some minimally acceptable performance
($\mathrm{CEP}_1$). Thus the tests may be developed as tests of
$H_0: M = \mathrm{CEP}_0$ vs $H_a: M = \mathrm{CEP}_1$, where $M$ denotes the true
(population) median radial miss distance of the system under
the conditions of the given cell of the design. Both of the
sequential test procedures are applications of Wald's Sequential
Probability Ratio Test (SPRT). One, the SPT, is based on
sequentially observing (within each cell of the design) radial
miss distances, and assuming a Rayleigh distribution model. The
SNT is based on observing only whether each drop falls within
$\mathrm{CEP}_0$ and assuming a binomial distribution model. The SPT
requires the smallest average sample size, but its validity
depends on whether the Rayleigh assumption is tenable (the latter
assumption is implied by the assumption that impacts on the
target plane follow a circular normal distribution, for example).
The SNT requires somewhat larger samples on the average than
does the SPT, but the binomial model involved is far less open
to criticism on the grounds of invalidity due to the assumed
distribution of radial miss distance.
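
As an illustration of the SNT, the following is a minimal sketch of Wald's approximations for a binomial SPRT on hit/miss data. It is not the procedure of report [1]; it assumes the hit probability is 1/2 when CEP = CEP_0 and 1 - 2^(-1/k^2) when CEP = k CEP_0 (the Rayleigh-based value used in Section 5), and all function and key names are hypothetical.

```python
import math

def snt_wald_summary(alpha=0.10, beta=0.10, k=2.0):
    """Wald-approximation summary for a binomial SPRT on hit/miss outcomes."""
    p0 = 0.5                              # P(hit) when CEP = CEP_0
    p1 = 1.0 - 2.0 ** (-1.0 / k ** 2)     # P(hit) when CEP = k * CEP_0 (Rayleigh assumption)
    # Log likelihood-ratio increments (H_a relative to H_0) for a hit and a miss.
    z_hit = math.log(p1 / p0)
    z_miss = math.log((1.0 - p1) / (1.0 - p0))
    # Wald's approximate stopping bounds on the cumulative log likelihood ratio.
    upper = math.log((1.0 - beta) / alpha)   # reject H_0 at or above this
    lower = math.log(beta / (1.0 - alpha))   # accept H_0 at or below this
    # Expected increment per drop under each hypothesis.
    mean_z0 = p0 * z_hit + (1.0 - p0) * z_miss
    mean_z1 = p1 * z_hit + (1.0 - p1) * z_miss
    # Wald's approximate expected sample sizes.
    en_h0 = ((1.0 - alpha) * lower + alpha * upper) / mean_z0
    en_h1 = (beta * lower + (1.0 - beta) * upper) / mean_z1
    return {
        "p1": p1,
        "min_accept": math.ceil(lower / z_hit),    # consecutive hits needed to accept
        "min_reject": math.ceil(upper / z_miss),   # consecutive misses needed to reject
        "EN_H0": en_h0,
        "EN_Ha": en_h1,
        "typical_N": 0.5 * (en_h0 + en_h1),
    }

summary = snt_wald_summary()
print({name: round(value, 3) for name, value in summary.items()})
```

With alpha = beta = 0.10 and k = 2 the averaged expected sample size from this sketch comes out near the "typical" value of about 6.3 quoted in Section 3.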

The  fixed  sample  size  procedures  are  also  based  on 
the  respective  stochastic  models  (Rayleigh  and  binomial) . If 
we  consider  the  equivalent  hypotheses  about  median  squared 
radial  miss  distance  and  measure  the  squared  radial  miss  distance 
of  each  drop,  the  Rayleigh  model  transforms  to  a chi-squared 
model  which  is  somewhat  more  tractable  computationally.  In 
what  follows  we  describe  the  FPT  in  these  terms. 

Suppose $R$ is distributed Rayleigh, so $R^2$ is distributed
exponential with mean $C^2/\ln 2$, where $C^2$ is squared CEP.
The likelihood ratio test of $H_0: \mathrm{median}(R^2) = C_0^2$ vs
$H_a: \mathrm{median}(R^2) = C_1^2$ is based on the test statistic

$$T = \sum_{i=1}^{N} R_i^2 .$$

$H_0$ is rejected whenever the calculated $T$ is sufficiently
large. Under $H_0$, $[(2 \ln 2)/C_0^2]\,T$ is distributed chi-squared
with $2N$ degrees of freedom. Thus the FPT procedure for each cell
of the design is to observe $N$ radial miss distances,
$R_1, R_2, \ldots, R_N$. Calculate the sum of squares,
$T = \sum_{i=1}^{N} R_i^2$. Reject $H_0$ if $T$ exceeds

$$\frac{C_0^2}{2 \ln 2}\, \chi^2_{(1-\alpha;\,2N)} ,$$

where $\chi^2_{(1-\alpha;\,2N)}$ is the $(1-\alpha)100\%$ point in the
$\chi^2_{(2N)}$ tables. For example, with $C_0^2 = 1$ (i.e., measured in
$C_0$-units), $N = 9$ and $\alpha = 0.10$, this critical value is 18.747.
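
The FPT critical value can be checked without tables. The sketch below is an illustration only: it uses the closed-form chi-squared distribution function for an even number of degrees of freedom and a simple bisection, and the function names are hypothetical.

```python
import math

def chi2_cdf_even(x, n):
    """CDF at x of a chi-squared variable with 2*n degrees of freedom."""
    if x <= 0.0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, n):
        term *= half / j          # running value of (x/2)^j / j!
        total += term
    return 1.0 - math.exp(-half) * total

def chi2_quantile_even(prob, n):
    """Quantile of chi-squared(2n), found by bisection on the closed-form CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, n) < prob:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, n) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fpt_critical_value(n_drops, alpha, cep0=1.0):
    """Value that the sum of squared radial misses must exceed to reject H_0."""
    return cep0 ** 2 / (2.0 * math.log(2.0)) * chi2_quantile_even(1.0 - alpha, n_drops)

def fpt_reject(radial_misses, alpha, cep0=1.0):
    """Apply the fixed-sample Rayleigh test to a list of radial miss distances."""
    t = sum(r * r for r in radial_misses)
    return t > fpt_critical_value(len(radial_misses), alpha, cep0)

print(round(fpt_critical_value(9, 0.10), 3))
```

For N = 9 and alpha = 0.10 in C_0-units this should agree with the value 18.747 given in the text.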

The FNT is a test of hypotheses about a binomial
parameter $p$: $H_0: p \ge 1/2$ vs $H_a: p < 1/2$, where $p$ represents
the probability an impact falls within the contract specified
CEP, say $\mathrm{CEP}_0$. Assuming independence among the bomb impacts
(see comments in Section 3 above), the number $X$ of "hits"
(impacts with $R_i \le \mathrm{CEP}_0$) in $N$ drops is binomially
distributed; further, under the null hypothesis it is binomial
with parameter 1/2 ($X \sim b(N, 1/2)$). The null hypothesis should
be rejected if the observed value of $X$ is on or below $b_{\alpha,N}$,
where $b_{\alpha,N}$ is the largest value (obtained from the $b(N, 1/2)$
tables) such that $P[X \le b_{\alpha,N}] \le \alpha$. For example, with
$\alpha = 0.10$ and $N = 12$, this critical value is 3. Note: due
to the discreteness of the binomial distribution, this
procedure is somewhat conservative, in that the actual type-I
error probability for this example is .073, rather than the
desired value, 0.10. If an exact test is desired, a randomized
decision rule can be used (see E. Lehmann [3] for details).
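
The FNT critical value and its actual size can be computed directly from the binomial distribution. This is a minimal sketch of the non-randomized rule described above; the function names are hypothetical.

```python
from math import comb

def binom_cdf(b, n, p=0.5):
    """P[X <= b] for X ~ binomial(n, p)."""
    return sum(comb(n, x) * p ** x * (1.0 - p) ** (n - x) for x in range(b + 1))

def fnt_critical_value(n, alpha):
    """Largest b with P[X <= b] <= alpha when X ~ b(n, 1/2); None if no such b."""
    best = None
    for b in range(n + 1):
        if binom_cdf(b, n) <= alpha:
            best = b
        else:
            break
    return best

def fnt_reject(hits, n, alpha):
    """Reject H_0 when the number of hits is on or below the critical value."""
    b = fnt_critical_value(n, alpha)
    return b is not None and hits <= b

crit = fnt_critical_value(12, 0.10)
actual_size = binom_cdf(crit, 12)
print(crit, round(actual_size, 3))
```

For N = 12 and alpha = 0.10 this gives the critical value 3 and the actual size of about .073 noted in the text.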

The tests of CEP within each cell, discussed above,
constitute the primary goal of the PSVT. Secondary goals
include analyses of effects of various factors included in
the design. An analysis of variance (AOV) is planned, using
data from the baseline trials. These types of cells in the
design receive relatively greater numbers of drops, and form
a factorial arrangement (with some unbalance in sample size).
It is anticipated that the analysis of variance will be based
on $\log R_i$ data, the log transformation being used to
stabilize variance over the cells, a condition required in
analysis of variance. To see the appropriateness of this
transformation, consider the type of distribution that is
likely to be sampled through observing radial miss distances
$R_1, R_2, \ldots, R_n$ in a cell of the design. We anticipate
that $R^2 \sim k\chi^2_{(2)}$, where "$\sim$" means "approximately
distributed as" and $k$ is a constant proportional to $\mathrm{CEP}^2$.
Then $E(R^2) = 2k$ and $V(R^2) = 4k^2$, so the standard deviation
in a given cell is approximately proportional to the mean, i.e.,
$\sigma = c\mu = h(\mu)$, where $h$ is linear (here $c = 1$). Then the
transformation $g$ given by

$$g(r^2) = \int \frac{1}{h(r^2)}\, dr^2 = \int \frac{1}{r^2}\, dr^2 = \ln r^2$$

is commonly used to make $\sigma$ constant over varying values of
$\mu$ (see Curtiss [2], for example). But $\ln r^2 = 2 \ln r$, hence
analysis of variance can be performed on $\log R_i$ data.
Appropriateness of this transformation can be assessed once the
experimentation data are available.
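
The variance-stabilizing effect of the log transformation can be illustrated by simulation. The sketch below assumes a circular normal impact model (so that R is Rayleigh) and uses three hypothetical cell CEPs in arbitrary units; under that model the variance of ln R equals pi^2/24 regardless of CEP.

```python
import math
import random
import statistics

def simulate_cell(cep, n_drops, rng):
    """Radial misses for one cell under a circular normal impact model."""
    # For a circular normal, CEP (median radial miss) = sigma * sqrt(2 ln 2).
    sigma = cep / math.sqrt(2.0 * math.log(2.0))
    return [math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for _ in range(n_drops)]

rng = random.Random(1977)              # fixed seed so the run is repeatable
results = {}
for cep in (50.0, 100.0, 200.0):       # hypothetical cell CEPs
    misses = simulate_cell(cep, 20000, rng)
    sd_r2 = statistics.pstdev([r * r for r in misses])
    var_log_r = statistics.pvariance([math.log(r) for r in misses])
    results[cep] = (sd_r2, var_log_r)
    print(cep, round(sd_r2, 1), round(var_log_r, 3))
```

The standard deviation of the squared miss grows with the square of CEP, while the variance of ln R stays essentially the same from cell to cell, which is the condition the analysis of variance requires.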

If the speed and altitude levels actually attained in
the trials vary substantially (say more than 10%) from the levels
specified in the design matrix, one or both of these factors
may be incorporated as covariates in an Analysis of Covariance
(AOC), rather than the analysis of variance described above.
Again, determination of whether this is necessary or desirable
can be made once the experimentation data are available. For
this purpose, the data arising from each drop should be in a
format which includes measured values of speed and altitude.

In addition to the AOV or AOC, secondary analysis may
include fitting a response surface to the observed drop data.
This could be done using regression (perhaps weighted to
accommodate inhomogeneity of variance) to estimate a surface
giving system accuracy as a function of the variables altitude,
range and possibly speed, for the system in the Auto mode.
Terms in the model should be selected so that known and
anticipated physical system characteristics and target/range
characteristics are likely to be adequately represented. Although
the dependent variable could be taken to be sample CEP in each
cell, a better model might result from modeling squared radial
miss distance via the regression, then transforming predictions
with this model to CEP predictions, if desired, using the
Rayleigh-based relationship.

Finally, additional analyses (such as pairwise comparisons,
cell CEP estimates, patterns of trial "aborts" and
"outlier"  rejections,  etc.)  and  presentations  of  summary  data 
should  be  undertaken.  The  precise  nature  of  these  analyses 
has  not  been  explored  as  yet,  and  to  a large  extent  will  depend 
on  the  data  obtained.  Close  coordination  with  test  personnel 
should  also  be  maintained  by  the  analyst,  in  order  to  assist 
in  determining  what  additional  analyses  would  be  appropriate. 

It  is  planned  to  use  the  ARIS  system  to  assist  in 
determining  causes  for  observed  large  misses.  This  procedure 
constitutes  an  "outlier"  rejection  rule,  which  could  bias  the 
experiment,  as  follows.  If  only  large  miss  drops  are  subjected 
to  the  ARIS  screening,  the  overall  effect  will  be  to  possibly 
eliminate  some  of  the  large  misses,  which  in  turn  makes  the 
remaining  drops  appear  more  accurate.  Such  screening  may  be 
appropriate;  however,  we  suggest  two  actions  which  may  assist 
in  determining  whether  biasing  has  occurred.  First,  records 
of  any  such  eliminated  drops  should  be  kept,  for  possible 
subsequent  analysis.  Second,  the  ARIS  screen  should  be  applied 
formally  to  a sample  of  "good"  drops,  using  the  same  rejection 
criteria  as  for  the  outlier  cases.  Records  should  be  kept 
of the results of such screening of "good" drops. These can
be used to help assess the degree of bias that may have been
induced by elimination of drops with large miss distances
that were not actually outliers.

With  the  large  number  of  individual  tests  being 
performed  with  the  primary  analysis  (i.e.,  one  CEP  test  in 
each  cell  of  the  design) , it  is  likely  that  there  will  be  a 
mixture  of  rejections  and  acceptances  of  the  contract  specified 
CEP's.  There  will  occur,  therefore,  the  problem  of  making  an 
overall  assessment  of  whether  the  system  is  sufficiently 
accurate.  It  would  be  a good  idea  to  explore  this  problem  with 
the decision maker, and to indicate how changing the Type I
and Type II error rates ($\alpha$ and $\beta$, respectively) affects the
accept/reject patterns that may be encountered. Perhaps the
significance of the observed number of rejections can be
assessed in terms of physical explanation of system patterns,
as well as the conditions anticipated in actual operational
use of the system. The binomial distribution may be of some
use in determining whether the number of rejections is
significant, or perhaps Fisher's method [4] of combining
experimental results can be used.
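
One way the binomial distribution might be used here is sketched below. It is an illustration under strong assumptions that the text does not state: the 18 baseline cells are tested independently, the system sits exactly at the contract CEP in every cell, and each test rejects with probability exactly alpha. The function name and defaults are hypothetical.

```python
from math import comb

def prob_at_least(r, n_cells=18, reject_prob=0.10):
    """P[number of rejections >= r] among n_cells independent tests."""
    return sum(comb(n_cells, x) * reject_prob ** x * (1.0 - reject_prob) ** (n_cells - x)
               for x in range(r, n_cells + 1))

for r in range(0, 7):
    print(r, round(prob_at_least(r), 4))
```

A small tail probability for the observed number of rejections would suggest that the rejections are not explained by Type I error alone. Because the tests described in Section 4 are conservative, the true per-cell rejection probability may be below the nominal alpha, so the figures from this sketch are only a rough guide.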


It should be borne in mind that theoretically the
secondary analyses may be affected by the stopping rule used
in the primary tests. If sequential tests are used for the
primary analysis, the data in each cell are, in a mild sense,
conditional, given that the data obtained led to acceptance or
rejection, as the case may be. It is not anticipated that
this simultaneous inference effect will be great enough to
cause difficulties from a practical point of view, however.

5.  A SAMPLE  SIZE  REDUCTION  METHOD 

We  have  argued  elsewhere  [1]  that  the  major  shortcoming 
of  the  Rayleigh  model  for  unguided  weapon  misses  is  that  in 
some  applications  it  fails  to  adequately  fit  the  upper  tail 
of  the  miss  distance  distribution.  Even  in  such  cases,  however, 
the  model  may  provide  useful  results  for  the  major  portion  of 
the  miss  distribution  short  of  the  very  large  misses.  In  what 
follows we describe such an application of the Rayleigh distribution
to reduce sample size required in the primary analyses
concerning  CEP  testing.  This  approach  is  applicable  to  both 
the  SNT  and  FNT.  Throughout,  we  assume  the  Rayleigh  model 
provides  reasonable  fit  to  the  radial  miss  distribution  except 
possibly  for  the  upper  tail  region  (which  we  define  here  as 
the  set  of  points  larger  than  the  upper  95%  point  in  the 
Rayleigh  distribution) . 

Suppose, then, under fixed conditions the squared
radial miss cumulative distribution function is

$$F_{R^2}(y) = 1 - \exp(-y \ln 2 / C^2), \qquad y > 0 ,$$

where $C^2$ is the median of $R^2$ (i.e., $C^2$ is the square of
the system CEP). Let $C_0^2$ denote the squared CEP under the
null hypothesis $H_0: \mathrm{CEP} = C_0$ and assume the alternative
hypothesis is $H_a: \mathrm{CEP} = C_1 = kC_0$. For convenience in notation,
assume miss distances are measured in $C_0$-units, so $C_0 = 1$,
and $k$ represents the $C_1/C_0$ ratio. Recall both the SNT
and FNT are based on the binomial distribution of the number
of hits inside a circle of radius 1 ($= C_0$). Under the null
hypothesis the probability of hitting this circle is

$$F_{R^2}(1) = 1 - \exp(-\ln 2 / 1^2) = 0.5 ,$$

and under $H_a$ the probability of such a hit is

$$F_{R^2}(1) = 1 - \exp(-\ln 2 / k^2) .$$

For example, with $k = 2$ this probability is $1 - \exp(-\ln 2 / 4) \approx .1591$.

The basic idea we wish to discuss is that of allowing
the definition of "hit" to be associated with circles of radii
possibly different from $C_0$. We shall show that even though
we maintain the null and alternate hypotheses about CEP
described above, the binomial data to test these hypotheses
can be made far more efficient by defining the hit/miss
criterion differently. Let $p_0(C)$ denote the probability
under $H_0$ of observing a miss distance within $C$ units of
the target, and similarly let $p_1(C)$ denote that probability
under $H_a$. Then

$$p_0(C) = P[R^2 \le C^2 \mid \mathrm{CEP} = 1] = 1 - \exp(-C^2 \ln 2) = 1 - 2^{-C^2}$$

and

$$p_1(C) = P[R^2 \le C^2 \mid \mathrm{CEP} = k] = 1 - \exp(-C^2 \ln 2 / k^2) = 1 - 2^{-C^2/k^2} = 1 - (1 - p_0(C))^{1/k^2} .$$

We wish to determine $C$ so as to minimize the sample size $N$
(or in the sequential case, expected sample size) required to
achieve a test of $H_0$ vs $H_a$ with preselected operating
characteristics $\alpha$ and $\beta$. Our procedure is to express $N$
as a function of $C$, then minimize. For ease of presentation
we use the arcsine transformation of binomial random variables
to normality [2], and limit ourselves to the fixed sample
size case (although neither of these conveniences is necessary).

With some radius $C$ of the hit circle definition,
the test of $H_0$ vs $H_a$ would be based on $X$, the observed
relative frequency of hits. The null hypothesis is rejected
for $X$ sufficiently small. For any selected value of $C$,
let $p(C)$ denote the corresponding probability an individual
bomb results in a hit. For even moderate values of $N$,

$$Y = 2 \sin^{-1}\sqrt{X} \sim N\!\left(2 \sin^{-1}\sqrt{p},\ \tfrac{1}{N}\right),$$

although the approximation may be quite rough if $p$ is
"extreme" (outside the interval (.05, .95) or so). Note: the
angle $2 \sin^{-1}\sqrt{X}$ is measured in radians. Now, in terms of
the test statistic $Y$, because $2 \sin^{-1}\sqrt{\cdot}$ is monotone
increasing, $H_0$ should be rejected if $Y < d$, where the
critical value $d$ and sample size $N$ are selected so that
the desired size and power are attained:

$$P[Y < d \mid \mathrm{CEP} = C_0] = \alpha , \qquad P[Y < d \mid \mathrm{CEP} = C_1] = 1 - \beta .$$

Using the arcsine transformation described above, these conditions
are met (at least to good approximation) provided

$$d = 2 \sin^{-1}\sqrt{p_0} - z_\alpha/\sqrt{N} , \qquad d = 2 \sin^{-1}\sqrt{p_1} - z_{1-\beta}/\sqrt{N} ,$$

where $z_\delta$ is the upper $\delta$ quantile of the standard normal
distribution, i.e., $P[Z > z_\delta] = \delta$. Thus in order to minimize $N$
subject to meeting the $\alpha$ and $\beta$ requirements it suffices to
maximize


$$f(p_0) = \sin^{-1}\sqrt{p_0} - \sin^{-1}\sqrt{p_1} = \sin^{-1}\sqrt{p_0} - \sin^{-1}\sqrt{1 - (1 - p_0)^{1/k^2}} .$$

This is easily done for various fixed values of the $\mathrm{CEP}_1/\mathrm{CEP}_0$
ratio $k$. Values of $f(p_0)$ can be used to estimate FNT
sample size requirements through the approximation

$$N \approx \left[\frac{z_\alpha - z_{1-\beta}}{2 f(p_0)}\right]^2 ,$$

which follows from equating the two expressions for $d$.
As an example to demonstrate this idea, suppose
$\alpha = \beta = 0.10$, $k = 2$. Then values of $f(p_0)$, the radius $C$
of the "hit" circle, and approximate sample sizes for the FNT
are as shown in Table 3.

The maximum of $f(p_0)$ occurs at $p_0 \approx .94$ and this
theoretically minimizes $N$. Note, however, that $p_0 = .90$
yields the same savings in sample size and has the advantage
of not involving the model so far into the upper tail as does
the sample size minimizing value, .94. Note the sample size
requirement with $p_0 = .90$ is substantially below that for the
$C = \mathrm{CEP}_0$ defined "hit" circle described in Section 4 in
connection with the SNT and FNT, where $p_0 = .50$. The relative
reduction in approximate sample size requirements for the
example discussed above is shown in Table 3.
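
The search for the best hit-circle definition can be sketched numerically. The code below is an illustration only: the sample size formula is the one obtained by equating the two expressions for d under the arcsine approximation, rounding up to a whole number of drops is an added assumption, and the function names and the 0.01 grid are hypothetical choices.

```python
import math
from statistics import NormalDist

def f(p0, k=2.0):
    """Arcsine-scale separation between the hit probabilities under H_0 and H_a."""
    p1 = 1.0 - (1.0 - p0) ** (1.0 / k ** 2)
    return math.asin(math.sqrt(p0)) - math.asin(math.sqrt(p1))

def approx_n(p0, alpha=0.10, beta=0.10, k=2.0):
    """Approximate fixed sample size from the normal (arcsine) approximation."""
    z_a = NormalDist().inv_cdf(1.0 - alpha)   # upper alpha point
    z_b = NormalDist().inv_cdf(1.0 - beta)    # upper beta point
    return ((z_a + z_b) / (2.0 * f(p0, k))) ** 2

def hit_radius(p0):
    """Radius C in CEP_0 units with p0 = 1 - 2**(-C**2)."""
    return math.sqrt(-math.log2(1.0 - p0))

grid = [i / 100 for i in range(50, 100)]      # p0 = .50, .51, ..., .99
best_p0 = max(grid, key=f)
for p0 in (0.50, 0.90, best_p0):
    print(p0, round(hit_radius(p0), 3), round(f(p0), 4), math.ceil(approx_n(p0)))
```

On this grid the maximizing value falls near .94, and p0 = .90 needs the same rounded sample size as the maximizing value while roughly halving the requirement at p0 = .50, consistent with the discussion above.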

As mentioned above, this sample size reduction scheme
can be used for both the SNT and FNT, although only the FNT

6.  RECOMMENDATION


Based on the information available, the following
approach to the PSVT is recommended: use the SNT for primary
CEP testing, possibly with reduction in $E(N)$ using the
method described in Section 5. However, if this reduction
scheme is adopted, the definition of the "hit" circle should
not be allowed to involve $p_0$ values too extreme (i.e.,
$C^2$ values too far in the upper tail of the Rayleigh
distribution). Probably a reasonable upper bound for $p_0$ is .90.

The tests should be conducted so as to deliver individual
drops in each cell of the design on different days, to
the extent possible. Drops should be made in each cell so
that uncontrolled variables (such as day, time of day, heading,
pilot, aircraft, weather, etc.) vary over as wide a span as
practicable. As pointed out in the preceding, this approach
yields the following advantages: (1) it gives observations
which are more nearly independent; (2) it provides estimates
of CEP which are more realistic; (3) it avoids the increase
in sample size with batch testing; and (4) it may give more
time to measure miss distances.

Secondary analyses of the radial miss data, including
(but not limited to) analysis of variance, analysis of
covariance, and multiple regression should be performed.
Transformations to stabilize variance and weighted regression
should be used if the data suggest there is lack of homogeneity
of variance.